Motivation In the novel Hard Times, Charles Dickens described the fictional "Coketown" as
follows: 

Fact, fact, fact, everywhere in the material aspect of the town; fact, fact, fact,
everywhere in the immaterial. The M'Choakumchild school was all fact, and the school of
design was all fact, and the relations between master and man were all fact, and
everything was fact between the lying-in hospital and the cemetery, and what you couldn't
state in figures, or show to be purchasable in the cheapest market and salable in the 
dearest, was not, and never should be, world without end, Amen. 

In real-life business intelligence, facts are of course very important, but opinion also plays a crucial 
role. Consider, for instance, the following scenario. A major computer manufacturer, disappointed 
with unexpectedly low sales, finds itself confronted with this question: 

Why aren't consumers buying our laptop? 

While concrete data such as the laptop's weight or the price of a competitor's model are obviously 
relevant, answering this question requires focusing more on people's personal views of such
objective characteristics. Moreover, subjective judgments regarding intangible qualities (e.g., "the
design is tacky" or "customer service was condescending") or even misperceptions ("updated
device drivers aren't available") must be taken into account as well.

Sentiment-analysis technologies for extracting opinions from unstructured human-authored
documents would be excellent tools for handling many business-intelligence tasks related to the
one just described. Continuing with our example scenario: it would be difficult to directly
survey laptop purchasers who haven't bought the company's product. Rather, we could employ a
system that (a) finds reviews or other expressions of opinion on the Web (newsgroups, individual
blogs, and aggregation sites such as epinions.com are likely to be productive sources) and then
(b) creates condensed versions of the reviews or a digest of the overall consensus. This would 
save the analyst from having to read potentially dozens or even hundreds of versions of the same 
complaints. Note that Internet sources can vary wildly in form, tenor, and even grammaticality; 
this fact underscores the need for robust techniques even when only one language (e.g., English) is 
considered. 

Challenges in sentiment classification Given the multitude of potential applications, researchers 
have been devoting more and more attention to sentiment analysis. Much of the current work is 
devoted to classification problems: determining whether a particular document or portion thereof 
is subjective or not, and/or determining whether the opinion it expresses is positive or negative. 
At first blush, this might not appear so hard: one might expect that we need only look for
obvious sentiment-indicating words, such as "great". The difficulty lies in the richness of human
language use. First, there can be an amazingly large number of ways to say the same thing
(especially, it seems, when that thing is a negative perception); this complicates the task of finding
a high-coverage set of indicators. Furthermore, the same indicator may admit several different
interpretations. Consider, for example, the following sentences:

• This laptop is a great deal. 

• A great deal of media attention surrounded the release of the new laptop model. 

• If you think this laptop is a great deal, I've got a nice bridge you might be interested in. 

Each of these sentences contains the three words "a great deal", but the opinions expressed are, 
respectively, positive, neutral, and negative. The first two sentences use the same phrase to mean 
different things. The last sentence involves sarcasm, which, along with related rhetorical devices, 
is an intrinsic feature of texts from unrestricted domains such as blogs and newsgroup postings. 
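To make the difficulty concrete, here is a minimal Python sketch of the naive keyword approach: sum the polarity of every lexicon word in a sentence. The lexicon, its weights, and the name `lexicon_score` are hypothetical, chosen only for illustration. Because the scorer only counts words, it gives the first two sentences the same positive score, and it rates the sarcastic third sentence highest of all, since that sentence also contains "nice".

```python
import re

# Hypothetical, deliberately tiny polarity lexicon (illustration only).
LEXICON = {"great": 1, "nice": 1, "good": 1, "tacky": -1, "condescending": -1}

def lexicon_score(sentence: str) -> int:
    """Sum the lexicon polarity of every word in the sentence; unknown words count 0."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(LEXICON.get(w, 0) for w in words)

sentences = [
    "This laptop is a great deal.",
    "A great deal of media attention surrounded the release of the new laptop model.",
    "If you think this laptop is a great deal, I've got a nice bridge you might be interested in.",
]

for s in sentences:
    # Positive, neutral, and negative sentences all receive positive scores.
    print(lexicon_score(s), s)
```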

In general, researchers have adopted one of two approaches to meeting the challenges that
sentiment analysis presents. Many groups are working to directly improve the selection and
interpretation of indicators through the incorporation of linguistic knowledge; given the subtleties of
natural language, such efforts will be critical to building operational systems. Others have been 
pursuing a different tack: employing learning algorithms that can automatically infer from text 
samples what indicators are useful. Besides being potentially more cost-effective, more easily 
ported to other domains and languages, and more robust to grammatical mistakes, learning-based 
systems can also discover indicators that humans might neglect. For example, in our own work, 
we found that the phrase "still," (comma included) is a better indicator of positive sentiment than
"good"; a typical instance of use would be a sentence like "Still, despite these flaws, I'd go with
this laptop". Nevertheless, it bears repeating that incorporating deep knowledge about language
will be absolutely crucial to developing systems capable of high-quality (as opposed to merely
high-throughput) sentiment analysis. Both the linguistic and the learning approach have
considerable merits; it seems very safe to say that the community will need to turn towards finding
ways to combine their advantages.
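As a rough illustration of how a learner can surface indicators from labeled samples, the Python sketch below ranks tokens by a smoothed log-odds ratio between positive and negative texts. This is a generic technique swapped in for illustration, not necessarily the method used in the work mentioned above; the six review snippets are invented, and the names `tokenize`, `indicator_scores`, and the smoothing constant `alpha` are assumptions. The tokenizer keeps a trailing comma attached to a word, so "still," and "still" count as different features.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercased words; a trailing comma stays attached, so "still," != "still".
    return re.findall(r"[a-z']+,?", text.lower())

def indicator_scores(pos_docs, neg_docs, alpha=1.0):
    """Add-alpha smoothed log-odds that a token comes from positive rather than negative text."""
    pos = Counter(t for d in pos_docs for t in tokenize(d))
    neg = Counter(t for d in neg_docs for t in tokenize(d))
    vocab = set(pos) | set(neg)
    n_pos, n_neg, v = sum(pos.values()), sum(neg.values()), len(vocab)
    return {
        t: math.log((pos[t] + alpha) / (n_pos + alpha * v))
        - math.log((neg[t] + alpha) / (n_neg + alpha * v))
        for t in vocab
    }

# Invented toy data, for illustration only (not from any real corpus).
pos_docs = [
    "Still, despite these flaws, I'd go with this laptop.",
    "The battery is weak. Still, I would buy it again.",
    "Still, for the price it is hard to beat.",
]
neg_docs = [
    "The screen looks good but the keyboard broke in a week.",
    "Good specs on paper, terrible in practice.",
    "I returned it; the fan is loud and the case is tacky.",
]

scores = indicator_scores(pos_docs, neg_docs)
ranked = sorted(scores, key=scores.get, reverse=True)  # most positive-leaning first
print(ranked[:5])
```

On data this small the ranking is only a demonstration of the mechanics; it says nothing about which indicators matter in real reviews.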

Related problems, new directions The classification problems discussed above only involve the 
determination of sentiment. However, there is growing interest in capturing interactions between
subjectivity and subject: we need to know not only what an author's opinion is, but also what that
opinion is about. For example, while in a broad sense a review of a particular laptop is only about
one topic (the laptop itself), it almost surely discusses various specific aspects of the machine. We 
would ideally like a sentiment-analysis system to reveal whether there are particular features that 
the review's author disapproves of even if his or her overall impression was positive. 
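A minimal sketch of the feature-level idea, assuming a hand-picked list of laptop aspects and a tiny polarity lexicon (both hypothetical): each sentence's polarity is credited to whatever aspect words it mentions. The helper `analyze` is illustrative only and ignores negation, coreference, and most of what makes the real problem hard. On the sample review the overall score is positive while the keyboard still comes out negative.

```python
import re
from collections import defaultdict

# Hypothetical aspect vocabulary and polarity lexicon (illustration only).
ASPECTS = {"battery", "keyboard", "screen", "price"}
POLARITY = {"great": 1, "excellent": 1, "love": 1, "weak": -1, "mushy": -1, "dim": -1}

def analyze(review):
    """Return (overall score, per-aspect scores) by crediting each sentence's
    polarity to the aspect words that sentence mentions."""
    overall = 0
    aspects = defaultdict(int)
    for sentence in re.split(r"[.!?]+", review.lower()):
        words = re.findall(r"[a-z']+", sentence)
        polarity = sum(POLARITY.get(w, 0) for w in words)
        overall += polarity
        for w in words:
            if w in ASPECTS:
                aspects[w] += polarity
    return overall, dict(aspects)

review = ("I love this laptop. The screen is excellent and the price is great. "
          "Sadly the keyboard is mushy.")
overall, aspects = analyze(review)
print(overall, aspects)
```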

Another interesting research direction of potentially great importance is to integrate into
sentiment analysis the notion of the status of an opinion holder, perhaps via adaptation of the
hubs-and-authorities techniques used in Web search or link-analysis methods in reputation systems. For
example, we might want to identify bellwethers (thought leaders with enough influence that others
explicitly adopt their opinions) or barometers (those whose opinions are generally held by the
majority of the population of interest). Tracking the views of these two types of people could
greatly streamline and enhance the process of gathering business intelligence.
Surely that sounds like a great deal! 
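For readers unfamiliar with hubs and authorities, here is a minimal Python sketch of the standard HITS power iteration on a hypothetical graph in which an edge (a, b) means that author a links to, or cites, author b's opinion. Reading high authority as a hint of bellwether status is an assumption made here for illustration; the names, the graph, and the fixed iteration count are all made up.

```python
import math

def hits(edges, iterations=50):
    """Plain HITS by power iteration.
    edges: list of (source, target) pairs, meaning source links to target.
    Returns (hub, authority) dicts, each L2-normalized."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the nodes that link to you.
        auth = {n: 0.0 for n in nodes}
        for s, t in edges:
            auth[t] += hub[s]
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {n: x / norm for n, x in auth.items()}
        # Hub: sum of authority scores of the nodes you link to.
        hub = {n: 0.0 for n in nodes}
        for s, t in edges:
            hub[s] += auth[t]
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {n: x / norm for n, x in hub.items()}
    return hub, auth

# Hypothetical citation graph between opinion holders.
links = [("ann", "zoe"), ("bob", "zoe"), ("cat", "zoe"), ("cat", "dan"), ("bob", "dan")]
hub, auth = hits(links)
print(max(auth, key=auth.get))  # most-cited opinion holder
```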